Abstract. We demonstrate a method to parallelize the computation of a
Gröbner basis for a homogeneous ideal in a multigraded polynomial ring. Our
method uses antichains in the lattice $\mathbb{N}^k$ to separate mutually
independent S-polynomials for reduction.



1. Introduction 

In this paper we present a way of parallelizing the Buchberger algorithm for
computing Gröbner bases in the special case of multihomogeneous ideals in the
polynomial algebra over a field. We describe our algorithm as well as our
implementation of it. We also present experimental results on the efficiency of
our algorithm, using the ideal of commuting matrices as illustration.

1.1. Motivation. Most algorithms in commutative algebra and algebraic geometry
at some stage involve computing a Gröbner basis for an ideal or module. This
ubiquity, together with the exponential complexity of the Buchberger algorithm
for computing Gröbner bases of homogeneous ideals, explains the large interest
in improvements of the basic algorithm.

1.2. Prior Work. Several approaches have been tried in the literature. Some
authors, such as Chakrabarti-Yelick [T] and Vidal [5], have constructed general
algorithms for distributed memory and shared memory machines respectively.
Reasonable speedups were achieved on small numbers of processors. Another
approach uses factorization of polynomials: all generated S-polynomials are
factorized on a master node, and the reductions of their factors are carried
out on the slave nodes; see the work by Siegl [3], Bradford [1], and Gräbe and
Lassner Q5]. In a paper by Leykin j5], a coarse-grained parallelism was studied
that was implemented in both the commutative and the non-commutative case.

Good surveys of the various approaches can be found in papers by Mityunin and
Pankratiev [7] and Amrhein, Bündgen and Küchlin [8]. Mityunin and Pankratiev
also give a theoretical analysis of and improvements to algorithms known at
that time.

Finally, an approach by Reeves [9] parallelizes at the compiler level for
modular coefficient fields.

1.3. Our approach. Our approach restricts the class of Gröbner bases treated to
homogeneous multigraded Gröbner bases. While certainly not general enough to
handle all interesting cases, the multigraded case covers several interesting
examples. For these ideals we describe a coarse-grained parallelization of the
Buchberger algorithm with promising results.
Example 1.1. Set $R = k[x_1, \dots, x_{n^2}, y_1, \dots, y_{n^2}]$ where $k$ is
a field. Let $X$ and $Y$ be square $n \times n$ matrices with entries the
variables $x_1, \dots, x_{n^2}$ and $y_1, \dots, y_{n^2}$ respectively. Then the
entries of the matrix

$$I_n = XY - YX$$

form $n^2$ polynomials generating the ideal $I_n \subseteq R$.

The computation of a Gröbner basis for $I_1$ and $I_2$ is trivial and may be
carried out on a blackboard. A Gröbner basis for $I_3$ is a matter of a few
minutes on most modern computer systems, and already the computation of a
Gröbner basis for $I_4$ is expensive using the standard reverse lexicographic
term order in $R$; in the Macaulay2 system [10], several hours are needed to
obtain a Gröbner basis with 563 elements. However, using clever product orders,
Hreinsdóttir has been able to find bases with 293 and 51 elements [12]. As far
as we are aware, a Gröbner basis for $I_5$ is not known.

By assigning multidegrees $(1, 0)$ to all the variables $x_1, \dots, x_{n^2}$
and $(0, 1)$ to all the variables $y_1, \dots, y_{n^2}$, the ideal $I_n$ becomes
multigraded over $\mathbb{N} \times \mathbb{N}$, and thus approachable with our
methods.
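As an illustration of Example 1.1, the following is a minimal sketch that
builds the entries of $XY - YX$ and checks that every term has multidegree
$(1, 1)$. It assumes a representation not used in the paper: a polynomial is a
dictionary from exponent vectors to integer coefficients, and the helper names
`commutator_entries` and `bidegree` are hypothetical.

```python
from collections import defaultdict
from itertools import product


def commutator_entries(n):
    """Entries of XY - YX as dicts {exponent vector: coefficient}.

    Assumed variable order: x_1..x_{n^2}, then y_1..y_{n^2}, with X and Y
    filled row by row.
    """
    nvars = 2 * n * n

    def x(i, j):  # position of the variable in row i, column j of X
        return i * n + j

    def y(i, j):  # position of the variable in row i, column j of Y
        return n * n + i * n + j

    def mono(a, b):  # exponent vector of the product of variables a and b
        e = [0] * nvars
        e[a] += 1
        e[b] += 1
        return tuple(e)

    entries = {}
    for i, j in product(range(n), repeat=2):
        poly = defaultdict(int)
        for k in range(n):
            poly[mono(x(i, k), y(k, j))] += 1  # term of (XY)_{ij}
            poly[mono(y(i, k), x(k, j))] -= 1  # term of (YX)_{ij}
        entries[(i, j)] = {m: c for m, c in poly.items() if c != 0}
    return entries


def bidegree(exponents, n):
    """Multidegree in N x N: (degree in the x's, degree in the y's)."""
    return (sum(exponents[: n * n]), sum(exponents[n * n:]))


gens = commutator_entries(2)
degrees = {bidegree(m, 2) for poly in gens.values() for m in poly}
print(degrees)
```

For $n = 2$ the only multidegree that occurs is $(1, 1)$, so all four
generators are homogeneous with respect to the $\mathbb{N} \times \mathbb{N}$
grading.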

Example 1.2. While this paper presents only the approach to multigraded ideals in 
a polynomial ring, an extension to free multigraded modules over multigraded rings 
is easily envisioned, and will be dealt with in later work. 

Gröbner bases for such free modules would be instrumental in computing
invariants from applied algebraic topology such as the rank invariant as well
as more involved normal forms for higher dimensional persistence modules [13].

2. Partially ordered monoids 

We shall recall some definitions and basic facts about partially ordered sets that 
will be of fundamental use in the remainder of this paper. 

A partially ordered set is a set equipped with a binary, reflexive,
antisymmetric and transitive order relation $\le$. Two objects $a, b$ such that
either $a \le b$ or $b \le a$ are called comparable, and two objects for which
neither $a \le b$ nor $b \le a$ holds are called incomparable. A subset $A$ of a
partially ordered set in which all objects are mutually incomparable is called
an antichain. An element $p$ is minimal if there is no distinct $q$ with
$q \le p$, maximal if there is no distinct $q$ with $p \le q$, smallest if all
other $q$ fulfill $p \le q$ and largest if all other $q$ fulfill $q \le p$.

There is a partially ordered set structure on $\mathbb{N}^d$ in which
$(a_1, \dots, a_d) \le (b_1, \dots, b_d)$ iff $a_i \le b_i$ for all $i$. This
structure is compatible with the monoid structure on $\mathbb{N}^d$ in the
sense that if $a \le b$, then $c * a \le c * b$ for $a, b, c \in \mathbb{N}^d$.
If a monoid has a partial order compatible with the multiplication in this
manner, we call it a partially ordered monoid.
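The definitions above translate directly into code. The following is a minimal
sketch, assuming elements of $\mathbb{N}^d$ are represented as tuples of
non-negative integers; the function names are hypothetical.

```python
def leq(a, b):
    """Componentwise order on N^d: a <= b iff a_i <= b_i for all i."""
    return all(x <= y for x, y in zip(a, b))


def comparable(a, b):
    return leq(a, b) or leq(b, a)


def is_antichain(elements):
    """True if the given distinct elements are pairwise incomparable."""
    elements = list(elements)
    return not any(
        comparable(a, b)
        for i, a in enumerate(elements)
        for b in elements[i + 1:]
    )


def add(a, b):
    """Monoid operation on N^d (written * in the text)."""
    return tuple(x + y for x, y in zip(a, b))


print(is_antichain([(2, 0), (1, 1), (0, 2)]))  # all of total degree 2
print(is_antichain([(1, 1), (2, 1)]))          # (1, 1) <= (2, 1)
```

Compatibility of the order with the monoid operation means, for instance, that
`leq(add(c, a), add(c, b))` holds whenever `leq(a, b)` does.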

In a partially ordered set $P$, we say that a subset $Q$ is an ideal if it is
downward closed, or in other words if for any $p \in P$, $q \in Q$ such that
$p \le q$, we have $p \in Q$. It is called a filter if it is upward closed, or
if for $p \in P$, $q \in Q$ such that $q \le p$, we have $p \in Q$.

An element $p$ is maximal in an ideal if any element $q \ne p$ such that
$p \le q$ is not a member of the ideal. Minimal elements of filters are defined
analogously. An ideal (filter) is generated by its maximal (minimal) elements
in the sense that the membership condition of the ideal (filter) is equivalent
to being smaller than (larger than) or equal to at least one generator.
Generators of an ideal or filter form an antichain.



A PARALLEL BUCHBERGER ALGORITHM FOR MULTIGRADED IDEALS 






Indeed, if they did not form an antichain, two of them would be comparable, and
then one of these two would not be maximal (minimal). An ideal (filter) is
finitely generated if it has finitely many generators, and it is principal if
it has exactly one generator.
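For a finite set of degrees in $\mathbb{N}^d$, the generators of the filter it
spans are its minimal elements, and membership reduces to a comparison against
those generators. A minimal sketch, with hypothetical helper names and tuples
standing in for elements of $\mathbb{N}^d$:

```python
def leq(a, b):
    """Componentwise order on N^d."""
    return all(x <= y for x, y in zip(a, b))


def minimal_elements(elements):
    """Minimal elements of a finite subset of N^d, in sorted order."""
    elements = set(elements)
    return sorted(
        p for p in elements
        if not any(q != p and leq(q, p) for q in elements)
    )


def in_filter(p, generators):
    """p lies in the filter iff it is above at least one generator."""
    return any(leq(g, p) for g in generators)


gens = minimal_elements([(2, 1), (1, 2), (2, 2), (3, 1)])
print(gens)
print(in_filter((3, 3), gens), in_filter((1, 1), gens))
```

Here the filter has the two generators $(1, 2)$ and $(2, 1)$, which are
incomparable, so it is finitely generated but not principal.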

There is a partially ordered monoid structure on $\mathbb{N}^d$, given by
$(p_1, \dots, p_d) \le (q_1, \dots, q_d)$ if $p_i \le q_i$ for all $i$, and by
$(p_1, \dots, p_d) * (q_1, \dots, q_d) = (p_1 + q_1, \dots, p_d + q_d)$.
This structure will be the main partially ordered monoid in use in this paper.



3. Multigraded rings and the grading lattice

A polynomial ring $R = k[x_1, \dots, x_r]$ over a field $k$ is said to be
multigraded over $P$ if each variable $x_j$ carries a degree $|x_j| \in P$ for
some partially ordered monoid $P$. We expect of the partial order on $P$ that
if $p, q \in P$ then $p \le p * q$ and $q \le p * q$. The degree extends from
variables to entire monomials by requiring $|mn| = |m| * |n|$ for monomials
$m, n$; and from thence a multigrading of the entire ring $R$ follows by
decomposing $R = \bigoplus_{p \in P} R_p$ where $R_p$ is the set of all
homogeneous polynomials in $R$ of degree $p$, i.e. polynomials with all
monomials of degree $p$. A homogeneous polynomial of degree $(n_1, \dots, n_d)$
is said to be of total degree $n_1 + \dots + n_d$. We note that for the
$\mathbb{N}^d$-grading on $R$, the only monomial with degree $(0, 0, \dots, 0)$
is 1, and thus the smallest degree is assigned both to the identity of the
grading monoid and to the identity of the ring.
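The decomposition $R = \bigoplus_p R_p$ can be made concrete as follows. This
is a sketch under assumptions not taken from the paper: polynomials are
dictionaries from exponent vectors to coefficients, and `VAR_DEGREES` is a
hypothetical grading of three variables $x_1, x_2, y_1$ over $\mathbb{N}^2$.

```python
from collections import defaultdict

# Hypothetical grading: |x1| = |x2| = (1, 0) and |y1| = (0, 1).
VAR_DEGREES = [(1, 0), (1, 0), (0, 1)]


def multidegree(exponents, var_degrees):
    """Degree of a monomial, extended from the variables via |mn| = |m| * |n|."""
    d = len(var_degrees[0])
    deg = [0] * d
    for e, vdeg in zip(exponents, var_degrees):
        for i in range(d):
            deg[i] += e * vdeg[i]
    return tuple(deg)


def homogeneous_components(poly, var_degrees):
    """Split a polynomial into its homogeneous parts, keyed by multidegree."""
    parts = defaultdict(dict)
    for monomial, coeff in poly.items():
        parts[multidegree(monomial, var_degrees)][monomial] = coeff
    return dict(parts)


# x1*y1 + x2*y1 + x1^2
poly = {(1, 0, 1): 1, (0, 1, 1): 1, (2, 0, 0): 1}
components = homogeneous_components(poly, VAR_DEGREES)
print(components)
```

The example polynomial is not homogeneous: it splits into a part of degree
$(1, 1)$ and a part of degree $(2, 0)$, both of total degree 2.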

We write $\operatorname{lm} p$, $\operatorname{lt} p$, $\operatorname{lc} p$
for the leading monomial, leading term and leading coefficient of $p$.

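Proposition 3.1. Suppose $p$ and $q$ are homogeneous. If $|p| \not\le |q|$ then
$\operatorname{lm} p$ does not divide $\operatorname{lm} q$.

Proof. If $\operatorname{lm} p \mid \operatorname{lm} q$ then
$\operatorname{lm} q = c \operatorname{lm} p$ for some monomial $c$, and thus
$|\operatorname{lm} q| = |c| * |\operatorname{lm} p|$, and thus, since
$|1| \le |c|$, by our requirement for a partially ordered monoid,
$|\operatorname{lm} p| \le |\operatorname{lm} q|$. Since $p$ and $q$ are
homogeneous, this says $|p| \le |q|$. □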
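Proposition 3.1 can be checked by brute force on small monomials. The sketch
below assumes a hypothetical $\mathbb{N}^2$-grading of three variables and
verifies that divisibility of monomials always implies the corresponding
inequality of degrees, which is the contrapositive of the proposition.

```python
from itertools import product

# Hypothetical grading: two variables of degree (1, 0), one of degree (0, 1).
VAR_DEGREES = [(1, 0), (1, 0), (0, 1)]


def degree(monomial):
    """Multidegree of a monomial given as an exponent vector."""
    return tuple(
        sum(e * vdeg[i] for e, vdeg in zip(monomial, VAR_DEGREES))
        for i in range(2)
    )


def divides(m, n):
    """m divides n iff every exponent of m is at most that of n."""
    return all(a <= b for a, b in zip(m, n))


def leq(a, b):
    return all(x <= y for x, y in zip(a, b))


# All monomials with exponents at most 2 in each variable.
monomials = list(product(range(3), repeat=3))
violations = [
    (m, n)
    for m in monomials
    for n in monomials
    if divides(m, n) and not leq(degree(m), degree(n))
]
print(len(violations))
```

In practice this means a divisibility test between leading monomials can be
skipped whenever the degrees of the two polynomials already fail to satisfy
$|p| \le |q|$.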

Proposition 3.2. Reduction in the Buchberger algorithm of a polynomial of a
given multidegree against a homogeneous generating set depends only on those
generators whose degrees lie in the principal ideal generated by that
multidegree in the partial order of degrees.

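Proof. We recall that the reduction of a polynomial $p$ with respect to
polynomials $q_1, \dots, q_k$ is given by computing

$$p' = p - \frac{\operatorname{lt} p}{\operatorname{lt} q_j}\, q_j$$

for a polynomial $q_j$ such that
$\operatorname{lm} q_j \mid \operatorname{lm} p$. We note that by
Proposition 3.1 this implies
$|\operatorname{lm} q_j| \le |\operatorname{lm} p|$ and thus $|q_j| \le |p|$.
Hence only those $q_j$ whose degree lies in the principal ideal generated by
$|p|$ can occur in a reduction of $p$. □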
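A single reduction step of this kind can be sketched in a few lines. The
representation is an assumption made for illustration only: polynomials are
dictionaries from exponent vectors to coefficients, the term order is the
lexicographic order on exponent vectors (so `max` picks the leading monomial),
and `reduce_step` is a hypothetical name.

```python
from fractions import Fraction


def lm(poly):
    """Leading monomial under the lexicographic order on exponent vectors."""
    return max(poly)


def reduce_step(p, q):
    """Return p - (lt p / lt q) * q if lm q divides lm p, otherwise None."""
    mp, mq = lm(p), lm(q)
    if not all(a <= b for a, b in zip(mq, mp)):
        return None  # lm q does not divide lm p: q cannot reduce p
    shift = tuple(b - a for a, b in zip(mq, mp))  # the monomial lm p / lm q
    factor = Fraction(p[mp]) / Fraction(q[mq])    # the scalar lc p / lc q
    result = dict(p)
    for monomial, coeff in q.items():
        shifted = tuple(a + s for a, s in zip(monomial, shift))
        new = result.get(shifted, 0) - factor * coeff
        if new == 0:
            result.pop(shifted, None)  # drop cancelled terms
        else:
            result[shifted] = new
    return result


# Variables x1, x2, y with assumed degrees (1, 0), (1, 0), (0, 1).
p = {(2, 0, 1): 1, (0, 2, 1): -1}  # x1^2*y - x2^2*y, degree (2, 1)
q = {(1, 0, 0): 1, (0, 1, 0): -1}  # x1 - x2, degree (1, 0)
reduced = reduce_step(p, q)
print(reduced)
```

The result is $x_1 x_2 y - x_2^2 y$, which still has degree $(2, 1)$: a
reduction step by a homogeneous $q$ leaves the multidegree of a homogeneous $p$
unchanged.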



We note that Proposition 3.2 implies that if the degrees of two S-polynomials
are incomparable, then their reductions against a common generating set are
completely independent of each other. Furthermore, since $|p'| = |p|$ in the
notation of the proof of Proposition 3.2, a reduction of an incomparable
S-polynomial can never have an effect on the future reductions of any given
S-polynomial.

Hence, once S-polynomials have been generated, their actual reductions may be
computed independently across antichains in the partial order of multidegrees,
and each S-polynomial only has to be reduced against the part of the Gröbner
basis that resides below it in degree.
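The consequence for scheduling can be illustrated by selecting, for each
S-polynomial degree, the basis elements that can take part in its reduction.
This is a sketch with hypothetical names; basis elements are represented only
by a label and a multidegree.

```python
def leq(a, b):
    """Componentwise order on N^d."""
    return all(x <= y for x, y in zip(a, b))


def relevant_basis(spoly_degree, basis):
    """Basis elements whose degree lies below the given S-polynomial degree."""
    return [name for degree, name in basis if leq(degree, spoly_degree)]


# Hypothetical partial basis, given as (multidegree, label) pairs.
basis = [((1, 1), "g1"), ((2, 1), "g2"), ((1, 2), "g3")]

print(relevant_basis((3, 1), basis))
print(relevant_basis((1, 3), basis))
print(relevant_basis((2, 2), basis))
```

The degrees $(3, 1)$ and $(1, 3)$ are incomparable, so the two corresponding
tasks can run concurrently: neither can produce a basis element that the other
would need.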
4. Algorithms

The arguments from Section 3 lead us to an approach to parallelization in which
we partition the S-polynomials generated by their degrees, pick out a minimal
antichain, and generate one computational task for each degree in the antichain.

One good source of antichains is to take the minimal total degree of an
unreduced S-polynomial and to produce as tasks all degrees with that total
degree. These degrees are guaranteed to form an antichain, though this choice
will most often produce more tasks than are actually populated by
S-polynomials.

Another, very slightly more computationally intense method is to take all mini- 
mal occupied degrees. These, too, form an antichain by minimality, and are guar- 
anteed to only yield as many tasks as have content. 
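Both ways of choosing an antichain can be sketched as follows, assuming the
degrees currently occupied by unreduced S-polynomials are given as a set of
tuples in $\mathbb{N}^d$. The function names are hypothetical.

```python
def leq(a, b):
    """Componentwise order on N^d."""
    return all(x <= y for x, y in zip(a, b))


def compositions(total, parts):
    """All tuples of `parts` non-negative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def total_degree_antichain(occupied, d):
    """All degrees in N^d sharing the minimal occupied total degree."""
    t = min(sum(degree) for degree in occupied)
    return list(compositions(t, d))


def minimal_occupied(occupied):
    """Only those occupied degrees that are minimal in the partial order."""
    return sorted(
        p for p in occupied
        if not any(q != p and leq(q, p) for q in occupied)
    )


occupied = {(2, 1), (1, 2), (3, 1), (2, 2)}
print(total_degree_antichain(occupied, 2))
print(minimal_occupied(occupied))
```

In this example the first strategy yields four tasks, two of which are empty,
while the second yields exactly the two populated degrees $(1, 2)$ and
$(2, 1)$.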

Either of these suggestions leads to a master-slave distributed algorithm. The
resulting master node algorithm is described in pseudocode in Algorithm 1, and
the simpler slave node algorithm in Algorithm 2.

Algorithm 1 Master algorithm for a distributed Gröbner basis computation
loop
    if have waiting degrees and waiting slaves then
        nextdeg ← pop(waiting degrees)
        nextslave ← pop(waiting slaves)
        send nextdeg to nextslave
    else if all slaves are working then
        wait for message from slave newslave
        push(newslave, waiting slaves)
    else if no waiting degrees and some working slaves then
        wait for message from slave newslave
        push(newslave, waiting slaves)
    else if no waiting degrees and no working slaves then
        generate new antichain of degrees
        if no such antichain available then
            finish up
        end if
    else
        continue
    end if
end loop
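The control flow of Algorithm 1 can be imitated on a single machine. The
following sketch is not the MPI-based implementation described in this paper:
it swaps message passing for a thread pool from the Python standard library,
uses the minimal occupied degrees as antichain, and takes the per-degree work
as a caller-supplied function. The names `run_master`, `process_degree` and
`toy` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def leq(a, b):
    """Componentwise order on N^d."""
    return all(x <= y for x, y in zip(a, b))


def minimal_elements(degrees):
    return sorted(
        p for p in degrees
        if not any(q != p and leq(q, p) for q in degrees)
    )


def run_master(initial_degrees, process_degree, workers=4):
    """Process degrees antichain by antichain and return the antichains used.

    process_degree(d) stands in for the slave's work in degree d and must
    return the degrees of the S-polynomials it newly created.
    """
    waiting = set(initial_degrees)
    rounds = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while waiting:
            antichain = minimal_elements(waiting)
            waiting -= set(antichain)
            rounds.append(antichain)
            # All tasks of one antichain finish before the next is generated.
            for new_degrees in pool.map(process_degree, antichain):
                waiting |= set(new_degrees)
    return rounds


def toy(d):
    """Toy workload: spawn the two degrees just above d, up to total degree 4."""
    candidates = ((d[0] + 1, d[1]), (d[0], d[1] + 1))
    return {n for n in candidates if sum(n) <= 4}


rounds = run_master({(1, 1)}, toy)
print(rounds)
```

As in Algorithm 1, a new antichain is only generated once no degrees are
waiting and no worker is busy, so each antichain acts as a synchronization
barrier.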



5. Experiments 

We have implemented the master-slave system described in Section 4 in Sage
[14], using MPI for Python [15, 16] for distributed computing infrastructure
and SQLAlchemy [17] interfacing with a MySQL database [18] as an abstraction of
a common storage for serialized Python objects.

In order to test our implementation, we have used a computational server running 
8 Intel Xeon processors at 2.83 GHz, with a 5M cache, and a total RAM available 
of 16G. 
Algorithm 2 Slave algorithm for a distributed Gröbner basis computation
loop
    receive message msg from master
    if msg = finish then
        return
    else if msg = new degree d then
        reduce all S-polynomials in degree d and append to Gröbner basis
        compute new S-polynomials based on new basis elements
        send finished degree to master
    end if
end loop
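The step "compute new S-polynomials" in Algorithm 2 can be illustrated with a
small sketch. It assumes, for illustration only, that polynomials are
dictionaries from exponent vectors to coefficients and that the term order is
lexicographic on exponent vectors; `s_polynomial` and `scaled` are hypothetical
names.

```python
from fractions import Fraction


def lm(poly):
    """Leading monomial under the lexicographic order on exponent vectors."""
    return max(poly)


def scaled(poly, shift, coeff):
    """Multiply a polynomial by the term coeff * x^shift."""
    return {
        tuple(a + s for a, s in zip(m, shift)): coeff * c
        for m, c in poly.items()
    }


def s_polynomial(f, g):
    """Return the S-polynomial of f and g and the lcm of their leading monomials."""
    mf, mg = lm(f), lm(g)
    lcm = tuple(max(a, b) for a, b in zip(mf, mg))
    left = scaled(f, tuple(l - a for l, a in zip(lcm, mf)), Fraction(1) / f[mf])
    right = scaled(g, tuple(l - b for l, b in zip(lcm, mg)), Fraction(1) / g[mg])
    result = dict(left)
    for m, c in right.items():
        new = result.get(m, 0) - c
        if new == 0:
            result.pop(m, None)  # leading terms (and any others) cancel
        else:
            result[m] = new
    return result, lcm


# Variables x1, x2, y with assumed degrees (1, 0), (1, 0), (0, 1).
f = {(1, 0, 1): 1, (0, 1, 1): -1}  # x1*y - x2*y, degree (1, 1)
g = {(1, 1, 0): 1, (0, 2, 0): 1}   # x1*x2 + x2^2, degree (2, 0)
s, lcm = s_polynomial(f, g)
print(s, lcm)
```

For homogeneous $f$ and $g$ the S-polynomial is again homogeneous, and its
multidegree is that of the least common multiple of the leading monomials, here
$(2, 1)$. This is the degree under which the new S-polynomial would be filed
for a later antichain.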



We have run tests with the Gröbner basis problem $I_3$, and recorded total
wall-clock timings, as well as specific timings for the S-polynomial generation
and reduction steps. The problem was run for each possible number of allocated
cores (1 to 7 slave processors), and the server was otherwise unutilized the
entire time.

As can be seen in Figure 1, parallelization decreases the wall-clock timings
radically compared to single-core execution (2 processors, with the slave
processor doing all work essentially serially). However, the subsequent
decrease in computational times is less dramatic.

Looking into specific aspects of the computation, we can see that while the av- 
erage computational times decrease radically with the number of available proces- 
sors, the maximal computation time behaves much worse. With the reduction step, 
maximal computation times still decrease, mostly, with the number of available 
processors. The S-polynomial generation step however displays almost constant 
maximal generation times along the computation. 

Furthermore, compared to the time needed for the algebraic computations, the 
relatively slow, database engine mediated storage and recovery times are almost 
completely negligible. 

These trends are even clearer when we concentrate on only the maximal, minimal
and average computation times, as in Figure 2. We see a proportional decrease
in average computation times, and a radical drop-off in minimal computation
times, which certainly sounds promising. The global behaviour, however, is
dictated by the maximal thread execution times, which behave rather
disappointingly throughout.

6. Conclusions and Future Work 

In conclusion, we have demonstrated that while the parallel computation of
Gröbner bases in general is a problem haunted by the ghost of data dependency,
the lattice structure in an appropriate choice of multigrading will allow for
easy control of dependencies. Specifically, picking out antichains in the
multigrading lattice gives a demonstrable parallelizability that saturates the
kind of computing equipment that is easily accessible to researchers today.

Furthermore, we have made our implementation of these methods publicly
accessible at http://code.google.com/p/dph-mg-grobner (the code used for
Section 5 can be found in the sage subdirectory) and released it under the very
liberal BSD license. Hence, with the ease of access to our code and to the Sage
computing system, we try to set the barrier to building further on our work as
low as we possibly can.

FIGURE 1. Timings (seconds) for $I_3$ on an 8-core computational server. Timing
runs were made with between 2 and 8 active processors, and the total wall-clock
times (top left), SQL interaction times (top right), the S-polynomial reduction
times (bottom left), and the S-polynomial generation times (bottom right) were
measured.

However, the techniques we have developed here are somewhat sensitive to the
distribution of workload over the grading lattice: if certain degrees are
disproportionately densely populated, then the computational burden of an
entire Gröbner basis is dictated by the essentially serial computation of the
highly populated degrees. As such, we expect these methods to work at their
very best in combination with other parallelization techniques.

The Gröbner basis implementation used was a rather naive one, and we fully
expect speed-ups from sophisticated algorithms to combine cleanly with the
constructions we use. This is something we expect to examine in a future
continuation of this project.

FIGURE 2. Logarithms of maximal, minimal and average timings (seconds) for
reduction (left) and generation (right) in the $I_3$ computations.

There are many places to go from here. We are ourselves interested in
investigating many avenues for the further application of the basic ideas
presented here:



• Adaptation to state-of-the-art Gröbner basis techniques for single
processors. Improve the handling of each separate degree, potentially
subdividing work even finer.

• Multigraded free modules, and Gröbner bases of these; opening up for the
use of these methods in computational and applied topology, as a
computational backbone for multigraded persistence.

• Multigraded free resolutions; opening up for the application of these meth- 
ods in parallelizing computations in homological algebra. 

• Adaptation to non-commutative cases; in particular to use for ideals in and 
modules over quiver algebras. 

• Building on work by Dotsenko and Khoroshkin, and by Dotsenko and
Vejdemo-Johansson, there is scope to apply this parallelization to the
computation of Gröbner bases for operads [19, 20].